##                _                           
## platform       x86_64-w64-mingw32          
## arch           x86_64                      
## os             mingw32                     
## system         x86_64, mingw32             
## status                                     
## major          3                           
## minor          4.3                         
## year           2017                        
## month          11                          
## day            30                          
## svn rev        73796                       
## language       R                           
## version.string R version 3.4.3 (2017-11-30)
## nickname       Kite-Eating Tree
library(readr)  # provides read_csv()
urlfile <- 'https://raw.githubusercontent.com/rit-public/HappyDB/master/happydb/data/cleaned_hm.csv'
hm_data <- read_csv(urlfile)
## Warning: NAs introduced by coercion

HappyDB is a corpus of 100,000 crowd-sourced happy moments collected via Amazon’s Mechanical Turk. You can read more about it at https://arxiv.org/abs/1801.07746.

For this analysis, I wanted to focus on a personal curiosity: what brings happiness to peers in my own age group (26-30)? A commonly repeated claim is that girls mature, or “reach adulthood,” a few years earlier than boys.

Can we make some inferences from the happy moments of millennials? How do males and females differ in this regard?

##       wid        original_hm        gender       marital      parenthood
##  Min.   :    1   Length:94340       f:38650   married:40890   n:58715   
##  1st Qu.:  403   Class :character   m:55690   single :53450   y:35625   
##  Median : 1099   Mode  :character                                       
##  Mean   : 2682                                                          
##  3rd Qu.: 3319                                                          
##  Max.   :13839                                                          
##                                                                         
##  reflection_period      age           country     
##  hours_24:46821    Min.   :17.00   USA    :73328  
##  months_3:47519    1st Qu.:25.00   IND    :16551  
##                    Median :30.00   VEN    :  546  
##                    Mean   :31.86   CAN    :  531  
##                    3rd Qu.:35.00   GBR    :  352  
##                    Max.   :95.00   (Other): 2885  
##                                    NA's   :  147  
##       ground_truth_category        predicted_category     text          
##  affection       : 4536     achievement     :31865    Length:94340      
##  achievement     : 4038     affection       :31969    Class :character  
##  bonding         : 1662     bonding         :10116    Mode  :character  
##  enjoy_the_moment: 1425     enjoy_the_moment:10467                      
##  leisure         : 1254     exercise        : 1146                      
##  (Other)         :  434     leisure         : 7071                      
##  NA's            :80991     nature          : 1706                      
##      count          peer_agegroup  
##  Min.   :  1.000   non-peer:65663  
##  1st Qu.:  3.000   peer    :28677  
##  Median :  5.000                   
##  Mean   :  6.168                   
##  3rd Qu.:  7.000                   
##  Max.   :509.000                   
## 
##       wid        original_hm        gender       marital      parenthood
##  Min.   :    2   Length:28677       f:10793   married:11092   n:19718   
##  1st Qu.:  354   Class :character   m:17884   single :17585   y: 8959   
##  Median :  925   Mode  :character                                       
##  Mean   : 2438                                                          
##  3rd Qu.: 2788                                                          
##  Max.   :13828                                                          
##                                                                         
##  reflection_period      age           country     
##  hours_24:14074    Min.   :26.00   USA    :20994  
##  months_3:14603    1st Qu.:27.00   IND    : 6378  
##                    Median :28.00   CAN    :  156  
##                    Mean   :27.98   PHL    :  129  
##                    3rd Qu.:29.00   VEN    :  102  
##                    Max.   :30.00   (Other):  894  
##                                    NA's   :   24  
##       ground_truth_category        predicted_category     text          
##  affection       : 1406     achievement     :9627     Length:28677      
##  achievement     : 1208     affection       :9232     Class :character  
##  leisure         :  552     bonding         :3200     Mode  :character  
##  bonding         :  530     enjoy_the_moment:3169                       
##  enjoy_the_moment:  476     exercise        : 411                       
##  (Other)         :  127     leisure         :2610                       
##  NA's            :24378     nature          : 428                       
##      count          peer_agegroup  
##  Min.   :  1.000   non-peer:    0  
##  1st Qu.:  3.000   peer    :28677  
##  Median :  5.000                   
##  Mean   :  6.357                   
##  3rd Qu.:  7.000                   
##  Max.   :509.000                   
## 
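The two summaries above cover the full data set and the 26-30 subset. A minimal sketch of how a `peer_agegroup` column like this could be derived and used for filtering is shown below. The toy data frame and the exact labelling logic are assumptions for illustration, not the code that produced the output above.

```r
# Toy stand-in for hm_data (hypothetical values); only `age` matters here.
toy <- data.frame(
  wid = c(1, 2, 3, 4, 5),
  age = c(24, 26, 28, 30, 31)
)

# Label a row "peer" when age falls in 26-30 (inclusive), else "non-peer".
toy$peer_agegroup <- factor(
  ifelse(toy$age >= 26 & toy$age <= 30, "peer", "non-peer"),
  levels = c("non-peer", "peer")
)

# Keep only the peer rows. Subsetting keeps the unused "non-peer" level,
# so a summary of the subset still lists it with a count of 0.
peers <- toy[toy$peer_agegroup == "peer", ]

table(toy$peer_agegroup)
table(peers$peer_agegroup)
```

Keeping both factor levels after filtering is consistent with the second summary, where `non-peer` is listed with a count of 0.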

What words occur most frequently?

“Friends,” “day,” and “time” all feature heavily for both sexes, but for females “husband” is the #4 word, while for males it is “played,” and “wife” does not appear until #10.

## [1] "numeric"
## [1] "tbl_df"     "tbl"        "data.frame"
## Warning in mutate_impl(.data, dots): Unequal factor levels: coercing to
## character
## Warning in mutate_impl(.data, dots): binding character and factor vector,
## coercing into character vector

## Warning in mutate_impl(.data, dots): binding character and factor vector,
## coercing into character vector
## Selecting by tf_idf
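The `Selecting by tf_idf` message suggests the words were ranked by a tf-idf score, with each gender treated as one document. The base R sketch below shows that calculation on a few made-up sentences. The toy data, the `tokenize` helper, and the natural-log idf are assumptions for illustration and may differ from the original pipeline.

```r
# Hypothetical happy moments, two per gender.
toy <- data.frame(
  gender = c("f", "f", "m", "m"),
  text = c("dinner with my husband",
           "my husband made dinner",
           "played games with friends",
           "played basketball with friends"),
  stringsAsFactors = FALSE
)

# Lowercase the text and split on anything that is not a letter or apostrophe.
tokenize <- function(x) {
  words <- unlist(strsplit(tolower(x), "[^a-z']+"))
  words[nzchar(words)]
}

# One row per (gender, word) with its count n.
by_gender <- split(toy$text, toy$gender)
counts <- do.call(rbind, lapply(names(by_gender), function(g) {
  tab <- table(tokenize(by_gender[[g]]))
  data.frame(gender = g, word = names(tab), n = as.integer(tab),
             stringsAsFactors = FALSE)
}))

# tf: share of a gender's words; idf: log(#genders / #genders using the word).
total <- tapply(counts$n, counts$gender, sum)
counts$tf <- counts$n / as.numeric(total[counts$gender])
doc_freq <- table(counts$word)
counts$idf <- log(length(by_gender) / as.numeric(doc_freq[counts$word]))
counts$tf_idf <- counts$tf * counts$idf

# Highest-scoring words first within each gender.
counts <- counts[order(counts$gender, -counts$tf_idf), ]
counts
```

With only two documents, any word used by both genders gets an idf of 0, so tf-idf highlights words distinctive to one gender, while raw counts (`n`) show the words both share.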

What if we look at pairs of words that occur together (bigrams)?

The #1 bigram for men aged 26-30 is… video games! Additionally, “played video” and “played games” appear as well.

## Warning in mutate_impl(.data, dots): Unequal factor levels: coercing to
## character
## Warning in mutate_impl(.data, dots): binding character and factor vector,
## coercing into character vector

## Warning in mutate_impl(.data, dots): binding character and factor vector,
## coercing into character vector
## Selecting by tf_idf
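To make the idea of a bigram concrete, here is a small base R sketch that pairs each word with the one that follows it and counts the pairs. The example sentences and the `to_bigrams` helper are hypothetical; the original analysis may have tokenized and filtered the text differently.

```r
# Split one sentence into lowercase words, then pair each word with its successor.
to_bigrams <- function(sentence) {
  words <- strsplit(tolower(sentence), "[^a-z']+")[[1]]
  words <- words[nzchar(words)]
  if (length(words) < 2) return(character(0))  # a single word has no bigram
  paste(head(words, -1), tail(words, -1))
}

# Hypothetical happy moments.
moments <- c("I played video games with my brother",
             "played video games all night",
             "Won")

# Build bigrams per moment so pairs never span two different moments.
all_bigrams <- unlist(lapply(moments, to_bigrams))
bigram_counts <- sort(table(all_bigrams), decreasing = TRUE)
head(bigram_counts, 3)
```

Building the pairs one moment at a time avoids counting a bigram made of the last word of one moment and the first word of the next.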

Summary:

1. While playing video games appears prominently among the happy moments of men aged 26-30, they draw similar happiness from promotions/achievements and moments of affection (dating their girlfriends or their wives giving birth).
2. For women, husbands, sons, and daughters come up more than I initially expected, whereas the word “boyfriend” occurs far less frequently.